Salesforce Data Cloud Ingestion from Confluence - Implementation Template
Application details
Technical considerations
- This solution is designed for Confluence Cloud (not server)
- An instance of the Mule application is deployed per Confluence Space and per type of content
- Content from Confluence should generally be provided as "text/html", if available in that format, using the export_view value
- A single Mule endpoint will support various change notifications and event types from Confluence
- The /ping endpoint will make an authenticated request to Confluence
- The Mule application is designed to be stateless
Webhooks
Webhooks must be pre-registered to allow change notifications to invoke the Mule application. A separate webhook must be registered per event type. Confluence Cloud supports the following events:
- Page events: Page Archived, Page Created, Page Copied, Page Moved, Page Removed, Page Restored, Page Trashed, Page Unarchived, Page Updated
- Blog events: Blog Created, Blog Removed, Blog Restored, Blog Trashed, Blog Updated
These events are mapped to the Change type as described below:
Confluence Cloud event type | Change type |
---|---|
Created, Copied, Restored, Unarchived | Create |
Updated | Update |
Removed, Trashed, Archived | Delete |
Moved | Create/Update/Delete (based on the newParent spaceId and the oldParent spaceId) |
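The mapping in the table can be expressed as a simple lookup. The following is a minimal sketch, not the Mule application's actual implementation. It assumes event names follow the `<content>_<action>` pattern (as in `page_unarchived` and `page_moved`); the function name `change_type_for` is hypothetical. Moved events are deliberately left out because their change type depends on the parent space identifiers.

```python
from typing import Optional

# Action suffix -> change type, taken from the mapping table.
# "moved" is intentionally absent: it needs the old and new parent space IDs.
CHANGE_TYPE_BY_ACTION = {
    "created": "Create",
    "copied": "Create",
    "restored": "Create",
    "unarchived": "Create",
    "updated": "Update",
    "removed": "Delete",
    "trashed": "Delete",
    "archived": "Delete",
}


def change_type_for(event_name: str) -> Optional[str]:
    """Return the change type for an event such as 'page_created'.

    Returns None for 'moved' events and for unknown event names.
    """
    # Split "page_created" into the content kind and the action.
    _, _, action = event_name.partition("_")
    return CHANGE_TYPE_BY_ACTION.get(action)
```

A caller would treat a `None` result as "needs further inspection" (a moved event) or "not supported".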
Change notifications
Confluence sends change notifications at a global level, that is, for all spaces defined for the account. The Mule application filters out change events that relate to Confluence Spaces other than the one configured for the given application deployment.
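As an illustration of this filtering, the sketch below keeps only events that match the space and content type configured for one deployment. The field names `spaceId` and `contentType`, the `DeploymentConfig` class, and the `is_relevant` function are assumptions made for the example and are not taken from the Confluence webhook payload specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentConfig:
    """Hypothetical per-deployment settings: one space, one content type."""
    space_id: str
    content_type: str  # for example "page" or "blog"


def is_relevant(event: dict, config: DeploymentConfig) -> bool:
    """Return True if the event belongs to the configured space and type.

    Events for any other space or content type are dropped.
    """
    return (
        event.get("spaceId") == config.space_id
        and event.get("contentType") == config.content_type
    )
```

Moved events reference two spaces (old and new parent), so in practice they would need a check against both identifiers rather than a single `spaceId`.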
Page moved events
Page moved events are received when a page is unarchived and when a page is moved. These events are processed in the following way:
- When a page is unarchived, two events are generated: page_unarchived and page_moved, which means the page has moved from the Archive space to the Parent space. Both events will be sent to Data Cloud.
- When a page is moved from one space to another, it's processed as follows:
  - If the page has moved out of the space that is being monitored by Data Cloud, it's considered a delete. In this case, the page and old parent identifiers are sent to allow Data Cloud to remove it from the UDLO.
  - If the page has moved into a space that is being monitored by Data Cloud, it's considered a create. In this case, the page and new parent identifiers are sent to allow Data Cloud to add it to the UDLO.
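The move rules can be sketched as a small classification function. This is an illustrative sketch only; the function and parameter names are hypothetical. Treating a move within the monitored space as an update is an assumption based on the "Create/Update/Delete" entry in the mapping table, since the text above only spells out the create and delete cases.

```python
from typing import Optional


def classify_move(
    old_space_id: str, new_space_id: str, monitored_space_id: str
) -> Optional[str]:
    """Classify a page_moved event relative to the monitored space."""
    was_monitored = old_space_id == monitored_space_id
    is_monitored = new_space_id == monitored_space_id

    if was_monitored and is_monitored:
        # Assumption: a move inside the monitored space is an update.
        return "Update"
    if was_monitored:
        # Moved out of the monitored space: remove it from the UDLO.
        return "Delete"
    if is_monitored:
        # Moved into the monitored space: add it to the UDLO.
        return "Create"
    # Neither space is monitored by this deployment: ignore the event.
    return None
```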
Activity diagrams
The following activity diagrams illustrate the sequence of processing to ingest the unstructured metadata and its content on-demand.
Initial Load/Full Refresh Synchronous
Initial Load/Full Refresh Asynchronous
Push Notifications
Get Content
Processing logic
The primary handling and orchestration of unstructured metadata ingestion will be implemented in the Salesforce Data Cloud Ingestion from Confluence Process API. This process is described in more detail in the following sections.
Initial Load/Full Refresh Synchronous
- A user action from Data Cloud initiates the request for a full refresh of the content metadata
- Data Cloud invokes the Mule application without a continuation token to start the process
- Mule application receives the request and will:
  - Retrieve the content metadata from Confluence
  - Transform the results into the Data Cloud format with a continuation token
- Data Cloud invokes the Mule application in a loop to handle pagination, passing the continuation token from the previous response on each call, until all the content metadata has been retrieved
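The continuation-token loop described above can be sketched as follows. This is a simplified illustration, not the actual Data Cloud or Mule contract: the response keys `items` and `continuationToken`, and the helpers `fetch_all` and `make_fake_handler`, are hypothetical. The fake handler encodes the next offset in the token, which is one way a handler can stay stateless between calls.

```python
from typing import Callable, List, Optional


def make_fake_handler(records: list, page_size: int) -> Callable[[Optional[str]], dict]:
    """Build a stand-in for the Mule endpoint that serves pages of records."""

    def handler(token: Optional[str]) -> dict:
        # No token means "start from the beginning".
        start = int(token) if token else 0
        end = start + page_size
        body = {"items": records[start:end]}
        # Only include a token when more records remain.
        if end < len(records):
            body["continuationToken"] = str(end)
        return body

    return handler


def fetch_all(fetch_page: Callable[[Optional[str]], dict]) -> List:
    """Call the endpoint repeatedly until no continuation token is returned."""
    items: List = []
    token: Optional[str] = None
    while True:
        response = fetch_page(token)
        items.extend(response["items"])
        token = response.get("continuationToken")
        if not token:
            return items
```

For example, `fetch_all(make_fake_handler(list(range(5)), 2))` makes three calls and collects all five records.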
Initial Load/Full Refresh Asynchronous
- Mule application receives a request to perform an asynchronous refresh of all metadata and will:
  - Retrieve the content metadata from Confluence
  - Transform the results into the required format for the ingestion API
  - Send the transformed data to the ingestion endpoint
- Mule application loops to handle pagination, using the continuation token from Confluence, until all the content metadata has been retrieved
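In the asynchronous case the loop runs inside the Mule application rather than in Data Cloud. The sketch below shows the retrieve, transform, and send cycle under stated assumptions: the three callables are injected placeholders (no real Confluence or ingestion API calls are made), and a cursor of `None` is assumed to mean that no further pages exist.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical shape: given a cursor, return (results, next_cursor or None).
FetchPage = Callable[[Optional[str]], Tuple[List[Any], Optional[str]]]


def run_async_refresh(
    fetch_confluence_page: FetchPage,
    transform: Callable[[Any], Any],
    send_to_ingestion: Callable[[List[Any]], None],
) -> int:
    """Push every page of metadata to the ingestion endpoint.

    Returns the number of records sent.
    """
    cursor: Optional[str] = None
    sent = 0
    while True:
        results, cursor = fetch_confluence_page(cursor)
        if results:
            # Transform and send one batch per Confluence page of results.
            send_to_ingestion([transform(record) for record in results])
            sent += len(results)
        if cursor is None:
            return sent
```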
Push Notifications
- Mule application receives the change notifications via the registered webhooks
- Mule application transforms the notifications and surfaces the updated metadata (all metadata attributes, both altered and unaltered metadata) into Data Cloud
Get Content
- Data Cloud initiates the request to retrieve the content
- Mule application receives the request to retrieve and stream the content from Confluence in HTML format
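Streaming means the content is passed through in chunks rather than loaded fully into memory. The following is a generic sketch of that idea, assuming the HTML is available as a readable binary stream; the function name `stream_html` and the default chunk size are illustrative choices, not values from the Mule application.

```python
import io
from typing import BinaryIO, Iterator


def stream_html(source: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the content of `source` in chunks of at most `chunk_size` bytes."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            # An empty read signals the end of the stream.
            break
        yield chunk


# Usage: forward each chunk to the caller as it arrives.
body = io.BytesIO(b"<p>hello</p>")
chunks = list(stream_html(body, chunk_size=4))
```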
Success conditions
Upon successful completion, the following conditions will be met:
- All metadata associated with the unstructured content for Pages and Blogs in Confluence is retrieved and processed.
- Notifications about changes to unstructured content metadata for both Pages and Blogs in a particular Confluence Space are sent in real time or as close to real time as possible.
- Retrieval of content for Pages and Blogs is supported.